Project Description
The Nobel Prize is one of the most prestigious international awards and has been given annually since 1901. It honors outstanding achievements in chemistry, literature, physics, physiology or medicine, and peace, along with economics, which was added later. In addition to the honor and prestige, recipients receive a cash award and a gold medal featuring the image of Alfred Nobel (1833–1896), the founder of the prize.
For this project, we will work with a dataset provided by the Nobel Foundation that contains information about all Nobel Prize winners from 1901 to 2023. The data comes from the Nobel Prize API and is saved in the file nobel.csv.
Our goal is to explore this rich dataset and answer several interesting questions about the history of the Nobel Prize. For example, are there any patterns or biases in how the awards have been given over time? By using our data manipulation and visualization skills, we will analyze the data to uncover insights and better understand the trends behind this famous prize.
This project will give us a chance to practice working with real-world historical data while learning about one of the most respected awards in the world.
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
# Read in the Nobel Prize data
nobel = pd.read_csv('nobel.csv')
nobel.head()
| | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | birth_country | sex | organization_name | organization_city | organization_country | death_date | death_city | death_country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | 160 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Male | Berlin University | Berlin | Germany | 1911-03-01 | Berlin | Germany |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | 569 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | Male | NaN | NaN | NaN | 1907-09-07 | Châtenay | France |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | 293 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Male | Marburg University | Marburg | Germany | 1917-03-31 | Marburg | Germany |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 462 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Male | NaN | NaN | NaN | 1910-10-30 | Heiden | Switzerland |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 463 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | Male | NaN | NaN | NaN | 1912-06-12 | Paris | France |
What are the most commonly awarded gender and birth country?
# Store and display the most commonly awarded gender and birth country
top_gender = nobel['sex'].value_counts().index[0]
top_country = nobel['birth_country'].value_counts().index[0]
print("\n The gender with the most Nobel laureates is :", top_gender)
print(" The most common birth country of Nobel laureates is :", top_country)
 The gender with the most Nobel laureates is : Male
 The most common birth country of Nobel laureates is : United States of America
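The top label alone does not show how dominant it is. As a minimal sketch, `value_counts(normalize=True)` returns shares instead of raw counts. The `toy` frame and its values below are made up for illustration and are not rows from nobel.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for nobel.csv (not real laureate records)
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", None],
    "birth_country": ["France", "Germany", "France", "France", None],
})

# value_counts() sorts by frequency (descending) and skips missing values by default
sex_share = toy["sex"].value_counts(normalize=True)
country_share = toy["birth_country"].value_counts(normalize=True)

top_sex = sex_share.index[0]        # same selection as .index[0] on raw counts
top_sex_share = sex_share.iloc[0]   # fraction of rows with a recorded value
top_country_toy = country_share.index[0]

print(f"{top_sex}: {top_sex_share:.0%} of rows with a recorded sex")
print(f"Most common birth country in the toy data: {top_country_toy}")
```

Because missing values are dropped before counting, the shares are relative to rows with a recorded value, which matters here since organizations have no `sex` entry.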
Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?
# Calculate the proportion of USA born winners per decade
nobel['usa_born_winner'] = nobel['birth_country'] == 'United States of America'
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)
prop_usa_winners = nobel.groupby('decade', as_index=False)['usa_born_winner'].mean()
prop_usa_winners.sort_values(by='usa_born_winner', ascending = False)
| | decade | usa_born_winner |
|---|---|---|
| 10 | 2000 | 0.422764 |
| 9 | 1990 | 0.403846 |
| 12 | 2020 | 0.360000 |
| 8 | 1980 | 0.319588 |
| 7 | 1970 | 0.317308 |
| 11 | 2010 | 0.314050 |
| 4 | 1940 | 0.302326 |
| 5 | 1950 | 0.291667 |
| 6 | 1960 | 0.265823 |
| 3 | 1930 | 0.250000 |
| 1 | 1910 | 0.075000 |
| 2 | 1920 | 0.074074 |
| 0 | 1900 | 0.017544 |
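The `decade` column above is built with a floating-point floor followed by a cast back to integers. A possible alternative, sketched here on a few illustrative years rather than the real data, is integer floor division, which stays in integer arithmetic throughout.

```python
import numpy as np
import pandas as pd

years = pd.Series([1901, 1909, 1910, 1999, 2000, 2023])  # illustrative years only

decade_floor = (np.floor(years / 10) * 10).astype(int)   # approach used in the notebook
decade_intdiv = years // 10 * 10                          # integer-only alternative

print(decade_intdiv.tolist())
print(decade_floor.tolist() == decade_intdiv.tolist())
```

Both versions give the same decades for positive integer years, so the choice is mostly a matter of readability.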
# Decade with the highest proportion of US-born winners
max_decade_usa = prop_usa_winners[prop_usa_winners['usa_born_winner'] == prop_usa_winners['usa_born_winner'].max()]['decade'].values[0]
# Plotting USA born winners
ax1 = sns.relplot(x='decade', y='usa_born_winner', data=prop_usa_winners, kind="line")
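The boolean filter used for `max_decade_usa` works, but the same lookup can be written more compactly with `idxmax()`. This is a sketch on a hypothetical frame shaped like `prop_usa_winners`; the proportions are invented for the example.

```python
import pandas as pd

# Hypothetical proportions, shaped like prop_usa_winners but with made-up values
toy_props = pd.DataFrame({
    "decade": [1980, 1990, 2000, 2010],
    "usa_born_winner": [0.30, 0.40, 0.45, 0.35],
})

# idxmax() returns the index label of the first maximum value
best_label = toy_props["usa_born_winner"].idxmax()
best_decade = int(toy_props.loc[best_label, "decade"])

print(best_decade)
```

If several decades tied for the maximum, `idxmax()` would return only the first, which matches what `.values[0]` does in the filtered version.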
Which decade and Nobel Prize category combination had the highest proportion of female laureates?
# Proportion of female laureates per decade and category
nobel['female_winner'] = nobel['sex'] == 'Female'
prop_female_winners = nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()
# Find the decade and category with the highest proportion of female laureates
max_female_decade_category = prop_female_winners[prop_female_winners['female_winner'] == prop_female_winners['female_winner'].max()][['decade', 'category']]
# Create a dictionary with the decade and category pair
max_female_dict = {max_female_decade_category['decade'].values[0]: max_female_decade_category['category'].values[0]}
# Plotting female winners
ax2 = sns.relplot(x='decade', y='female_winner', hue='category', data=prop_female_winners, kind="line")
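The grouped mean works because the mean of a boolean column is the proportion of `True` values in each group. A small sketch of the same idea, using invented records and keeping the group keys as a MultiIndex so that `idxmax()` returns the (decade, category) pair directly:

```python
import pandas as pd

# Hypothetical toy records (not real laureates)
toy = pd.DataFrame({
    "decade":   [2000, 2000, 2000, 2010, 2010, 2010],
    "category": ["Peace", "Peace", "Physics", "Peace", "Physics", "Physics"],
    "sex":      ["Female", "Male", "Male", "Female", "Male", "Female"],
})
toy["female_winner"] = toy["sex"] == "Female"

# Mean of a boolean column = proportion of True values in each group
props = toy.groupby(["decade", "category"])["female_winner"].mean()

# With a MultiIndex, idxmax() returns the (decade, category) tuple of the first maximum
top_decade, top_category = props.idxmax()
toy_max_dict = {int(top_decade): top_category}

print(toy_max_dict)
```

Note that a group with a single laureate can reach a proportion of 1.0, so in small groups the highest proportion may reflect very few prizes.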
Who was the first woman to receive a Nobel Prize, and in what category?
# The first woman to win a Nobel Prize
nobel_women = nobel[nobel['female_winner']]
min_row = nobel_women[nobel_women['year'] == nobel_women['year'].min()]
first_woman_name = min_row['full_name'].values[0]
first_woman_category = min_row['category'].values[0]
print(f"\n The first woman to win a Nobel Prize was {first_woman_name}, in the category of {first_woman_category}.")
The first woman to win a Nobel Prize was Marie Curie, née Sklodowska, in the category of Physics.
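Filtering on the minimum year can return more than one row when several women share the earliest year, and `.values[0]` then silently picks the first. The sketch below, on invented placeholder laureates, checks for such ties before reporting a single name.

```python
import pandas as pd

# Hypothetical toy records with placeholder names (not real laureates)
toy = pd.DataFrame({
    "year": [1901, 1903, 1903, 1905],
    "category": ["Physics", "Physics", "Chemistry", "Peace"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "sex": ["Male", "Female", "Female", "Female"],
})

women = toy[toy["sex"] == "Female"]

# All rows that share the earliest year among women
earliest = women[women["year"] == women["year"].min()]
n_tied = len(earliest)
tied_names = earliest["full_name"].tolist()

if n_tied > 1:
    print(f"{n_tied} women share the earliest year: {tied_names}")
else:
    print(f"First woman: {tied_names[0]}")
```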
Which individuals or organizations have won more than one Nobel Prize throughout the years?
# The laureates that have received 2 or more prizes
counts = nobel['full_name'].value_counts()
repeats = counts[counts >= 2].index
repeat_list = list(repeats)
print("\n The repeat winners are :", repeat_list)
The repeat winners are : ['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
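Counting by `full_name` assumes that no two different laureates share a name. Whether that holds in nobel.csv is not checked here; as a hedged alternative, the sketch below counts by `laureate_id`, assuming the ID uniquely identifies a laureate. The toy rows and names are invented to show how the two approaches can differ.

```python
import pandas as pd

# Hypothetical toy data: IDs 3 and 4 are different laureates with the same name
toy = pd.DataFrame({
    "laureate_id": [1, 1, 2, 3, 4],
    "full_name": ["A. Example", "A. Example", "B. Sample", "C. Person", "C. Person"],
})

# Count prizes per laureate_id, keeping one name per ID for display
per_id = toy.groupby("laureate_id").agg(
    name=("full_name", "first"),
    prizes=("full_name", "count"),
)
repeat_by_id = per_id.loc[per_id["prizes"] >= 2, "name"].tolist()

# Name-based counting, as in the notebook
name_counts = toy["full_name"].value_counts()
repeat_by_name = sorted(name_counts[name_counts >= 2].index)

print("By ID:  ", repeat_by_id)
print("By name:", repeat_by_name)
```

In the toy data the name-based count reports "C. Person" as a repeat winner even though the two rows belong to different IDs, while the ID-based count does not.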